Emma and Noah were the most popular baby names from 2011 to 2020.
Around 1958, there were close to 10,000 baby boys named Steve. There was a steady decline in the use of the name after that and still to the present day. Barbara was a very popular name for baby girls in the mid 1950s where close to 50,000 girls were given the name. Just like the name Steve for boys, the usage hit its peak and has been in a decline ever since.
The letter āAā seems to be the most popular first letter for girls, while the letter āJā is the most popular for boys.
---
title: "Project 1: Exploring 100+ Years of US Baby Names"
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source: embed
---
```{r setup, include = FALSE}
library(tidyverse)
library(flexdashboard)
FILE_NAME <- here::here("data/names.csv.gz")
tbl_names <- readr::read_csv(FILE_NAME, show_col_types = FALSE)
knitr::opts_chunk$set(
fig.path = "img/",
fig.retina = 2,
fig.width = 6,
fig.asp = 9/16,
fig.pos = "t",
fig.align = "center",
# dpi = if (knitr::is_latex_output()) 72 else 150,
out.width = "100%",
# dev = "svg",
dev.args = list(png = list(type = "cairo-png")),
optipng = "-o1 -quiet"
)
ggplot2::theme_set(ggplot2::theme_gray(base_size = 8))
```
### Most Popular Female and Male Baby Names
```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-1-transform
tbl_names_popular = tbl_names |>
# Keep ROWS for year > 2010 and <= 2020
filter(year > 2010, year <= 2020) |>
# Group by sex and name
group_by(sex, name) |>
# Summarize the number of births
summarize(
nb_births = sum(nb_births),
.groups = "drop"
) |>
# Group by sex
group_by(sex) |>
# For each sex, keep the top 5 rows by number of births
slice_max(nb_births, n = 5)
tbl_names_popular
```
```{r}
# PASTE BELOW >> CODE FROM question-1-plot BELOW
tbl_names_popular |>
# Reorder the names by number of births
mutate(name = fct_reorder(name, nb_births)) |>
# Initialize a ggplot for name vs. nb_births
ggplot(aes(x = nb_births, y = name)) +
# Add a column plot layer
geom_col() +
# Facet the plots by sex
facet_wrap(~ sex, scales = "free_y") +
# Add labels (title, subtitle, caption, x, y)
labs(
title = 'Most Popular Female and Male Baby Names',
subtitle = 'From 2011 - 2020',
caption = 'Source: United States Social Security Administration',
x = 'Number of Births',
y = 'Name'
) +
# Fix the x-axis scale
scale_x_continuous(
labels = scales::unit_format(scale = 1e-3, unit = "K"),
expand = c(0, 0),
) +
# Move the plot title to top left
theme(
plot.title.position = 'plot'
)
```
***
Emma and Noah were the most popular baby names from 2011 to 2020.
### Name Trends
```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-2-transform
tbl_names_popular_trendy = tbl_names |>
# Group by sex and name
group_by(sex, name) |>
# Summarize total number of births and max births in a year
summarize(
nb_births_total = sum(nb_births),
nb_births_max = max(nb_births),
.groups = "drop"
) |>
# Filter for names with at least 10000 births
filter(nb_births_total > 1000) |>
# Add a column for trendiness computed as ratio of max to total
mutate(trendiness = nb_births_max/nb_births_total) |>
# Group by sex
group_by(sex) |>
# Slice top 5 rows by trendiness for each group
slice_max(trendiness, n = 5)
tbl_names_popular_trendy
```
```{r}
# PASTE BELOW >> CODE FROM question-2-visualize
plot_trends_in_name <- function(my_name) {
tbl_names |>
# Filter for name = my_name
filter(name == my_name) |>
# Initialize a ggplot of `nb_births` vs. `year` colored by `sex`
ggplot(aes(x = year, y = nb_births, color = sex)) +
# Add a line layer
geom_line() +
# Add labels (title, x, y)
labs(
title = glue::glue("Babies named {my_name} across the years!"),
x = 'Year',
y = 'Number of Births'
) +
# Update plot theme
theme(plot.title.position = "plot")
}
plot_trends_in_name("Steve")
plot_trends_in_name("Barbara")
```
***
Around 1958, there were close to 10,000 baby boys named Steve. There was a steady decline in the use of the name after that and still to the present day. Barbara was a very popular name for baby girls in the mid 1950s where close to 50,000 girls were given the name. Just like the name Steve for boys, the usage hit its peak and has been in a decline ever since.
### Distribution of Births By First Letter
```{r results = "hide"}
# PASTE BELOW >> CODE FROM question-3-transform-1 and question-3-transform-2
tbl_names = tbl_names |>
# Add NEW column first_letter by extracting `first_letter` from name using `str_sub`
mutate(first_letter = str_sub(name, 1, 1)) |>
# Add NEW column last_letter by extracting `last_letter` from name using `str_sub`
mutate(last_letter = str_sub(name, -1, -1)) |>
# UPDATE column `last_letter` to upper case using `str_to_upper`
mutate(last_letter = str_to_upper(last_letter))
tbl_names
tbl_names_by_letter = tbl_names |>
# Group by year, sex and first_letter
group_by(year, sex, first_letter) |>
# Summarize total number of births, drop the grouping
summarize(nb_births = sum(nb_births), .groups = "drop") |>
# Group by year and sex
group_by(year, sex) |>
# Add NEW column pct_births by dividing nb_births by sum(nb_births)
mutate(pct_births = nb_births/ sum(nb_births))
tbl_names_by_letter
```
```{r}
# PASTE BELOW >> CODE FROM question-3-visualize-1
tbl_names_by_letter |>
# Filter for the year 2020
filter(year == 2020) |>
# Initialize a ggplot of pct_births vs. first_letter
ggplot(aes(x = first_letter, y= pct_births)) +
# Add a column layer using `geom_col()`
geom_col() +
# Facet wrap plot by sex
facet_wrap(~ sex) +
# Add labels (title, subtitle, x, y)
labs(
title = 'Distribution of Births By First Letter For 2020',
x = 'Letter',
y = 'Percent of Births'
) +
# Fix scales of y axis
scale_y_continuous(
expand = c(0, 0),
labels = scales::percent_format(accuracy = 1L)
) +
# Update plotting theme
theme(
plot.title.position = "plot",
axis.ticks.x = element_blank(),
panel.grid.major.x = element_blank()
)
```
***
The letter "A" seems to be the most popular first letter for girls, while the letter "J" is the most popular for boys.